
UCL Discovery

Warwick Research Archives Portal Repository

Empirical Bayes models for multiple probe type microarrays at the probe level

Author: A Hess
A Sjögren
A Spira
AM Hein
AP Dempster
B Efron
BP Durbin
BP Durbin
D Gaile
D Holder
DM Rocke
E Kristiansson
E Kristiansson
GK Smyth
I Lönnstedt
IA Eaves
J Comander
J Hu
JW Tukey
LM Cope
M Åstrand
MA Sartor
Magnus Åstrand
Mats Rudemo
N Jain
P Baldi
P Munson
Petter Mostad
R Opgen-Rhein
RA Irizarry
RS Stearman
S Choe
SC Geller
T Hastie
VG Tusher
W Huber
W Lemon
X Liu
X Liu
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background: When analyzing microarray data, a primary objective is often to find differentially expressed genes. With empirical Bayes and penalized t-tests, the sample variances are adjusted towards a global estimate, producing more stable results than ordinary t-tests. However, for Affymetrix-type data a clear dependency between variability and intensity level generally exists, even for logged intensities, most clearly for data at the probe level but also for probe-set summaries such as the MAS5 expression index. As a consequence, adjustment towards a global estimate results in an intensity-level-dependent false positive rate. Results: We propose two new methods for finding differentially expressed genes, Probe level Locally moderated Weighted median-t (PLW) and Locally Moderated Weighted-t (LMW). Both methods use an empirical Bayes model that takes the dependency between variability and intensity level into account. A global covariance matrix is also used, allowing for differing variances between arrays as well as array-to-array correlations. PLW is specially designed for Affymetrix-type arrays (or other multiple-probe arrays). Instead of making inference on probe-set summaries, comparisons are made separately for each perfect-match probe and are then summarized into one score for the probe set. Conclusion: The proposed methods are compared to 14 existing methods using five spike-in data sets. For RMA and GCRMA processed data, PLW has the most accurate ranking of regulated genes in four out of the five data sets, and LMW consistently performs better than all examined moderated t-tests when used on RMA, GCRMA, and MAS5 expression indexes.
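The variance adjustment described in the abstract above can be illustrated with a generic moderated t-statistic. This is a minimal sketch, not the PLW or LMW method: the function name `moderated_t` and the parameters `prior_var` and `prior_df` are hypothetical, and the prior is simply passed in rather than estimated from the data. An intensity-dependent ("local") variant would supply a different `prior_var` depending on the gene's mean intensity.

```python
import math
from statistics import mean, variance


def moderated_t(group_a, group_b, prior_var, prior_df):
    """Two-sample t-statistic whose pooled variance is shrunk toward prior_var.

    prior_var and prior_df are assumed inputs (hypothetical names); in an
    empirical Bayes analysis they would be estimated from all genes.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    # Weighted average of the prior variance and the gene's own variance,
    # with weights given by the respective degrees of freedom.
    shrunk_var = (prior_df * prior_var + df * pooled_var) / (prior_df + df)
    std_err = math.sqrt(shrunk_var * (1.0 / n_a + 1.0 / n_b))
    return (mean(group_a) - mean(group_b)) / std_err
```

With `prior_df=0` the function reduces to the ordinary pooled-variance t-statistic; larger `prior_df` pulls the variance estimate more strongly toward `prior_var`, which is what stabilizes results for genes with few replicates.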

Springer

Chalmers Publication Library

Chalmers Research

Dynamic modeling of gene expression in prokaryotes: application to glucose-lactose diauxie in Escherichia coli

Author: A Haye
A Krishnan
BP Durbin
D Hekstra
H Bolouri
H Kitano
H Lodish
H Lähdesmäki
J Gebert
J Vohradsky
Jaroslav Albert
Marianne Rooman
MF Traxler
MR Maurizi
NE Buchler
P Smolen
RA Veitia
RD Leclerc
RN Gutenkunst
TS Gardner
TT Vu
W Liebermeister
Publication venue
Publication date: 01/01/2011
Field of study

Coexpression of genes or, more generally, similarity in the expression profiles poses an unsurmountable obstacle to inferring the gene regulatory network (GRN) based solely on data from DNA microarray time series. Clustering of genes with similar expression profiles allows for a coarse-grained view of the GRN and a probabilistic determination of the connectivity among the clusters. We present a model for the temporal evolution of a gene cluster network which takes into account interactions of gene products with genes and, through a non-constant degradation rate, with other gene products. The number of model parameters is reduced by using polynomial functions to interpolate temporal data points. In this manner, the task of parameter estimation is reduced to a system of linear algebraic equations, thus making the computation time shorter by orders of magnitude. To eliminate irrelevant networks, we test each GRN for stability with respect to parameter variations, and impose restrictions on its behavior near the steady state. We apply our model and methods to DNA microarray time series data collected on Escherichia coli during glucose-lactose diauxie and infer the most probable cluster network for different phases of the experiment. Comment: 20 pages, 4 figures; Systems and Synthetic Biology 5 (2011
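The idea of turning parameter estimation into linear algebra by interpolating the time series with polynomials can be sketched on a toy case. The example below is an illustration under strong assumptions, not the model from the abstract: it uses a single cluster with a hypothetical rate law dx/dt = production - degradation * x and constant rates, whereas the abstract describes a network with non-constant degradation. All function names are invented for this sketch.

```python
def solve(a, b):
    """Solve a square linear system by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        known = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - known) / m[i][i]
    return x


def least_squares(rows, rhs):
    """Least-squares solution via the normal equations (tiny systems only)."""
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    atb = [sum(r[i] * y for r, y in zip(rows, rhs)) for i in range(k)]
    return solve(ata, atb)


def fit_rates(times, levels, degree=3):
    """Estimate (production, degradation) for dx/dt = production - degradation*x."""
    # Step 1: fit a polynomial to the time series (coefficients ascending).
    coeffs = least_squares(
        [[t ** k for k in range(degree + 1)] for t in times], levels)
    # Step 2: differentiate the polynomial analytically at each time point.
    slopes = [sum(k * coeffs[k] * t ** (k - 1) for k in range(1, degree + 1))
              for t in times]
    # Step 3: the rate law is linear in the two unknown rates, so the
    # estimation is a linear least-squares problem with no ODE integration.
    production, degradation = least_squares(
        [[1.0, -x] for x in levels], slopes)
    return production, degradation
```

Because the derivative comes from the fitted polynomial rather than from repeatedly integrating the differential equation, each candidate model costs only a small linear solve, which is the source of the speed-up the abstract mentions.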

arXiv.org e-Print Archive

DI-fusion

Probe set algorithms: is there a rational best bet?

Author: B Harr
BM Bolstad
BM Bolstad
BP Durbin
C Li
C Li
DM Rocke
Eric P Hoffman
FF Millenaar
J Freudenberg
J Seo
J Seo
Jinwook Seo
JN McClintick
M Bakay
M Inoue
P Zhao
RA Irizarry
RA Irizarry
RA Irizarry
S Huang
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2006
Field of study

Affymetrix microarrays have become a standard experimental platform for studies of mRNA expression profiling. Their success is due, in part, to the multiple oligonucleotide features (probes) against each transcript (probe set). This multiple testing allows for more robust background assessments and gene expression measures, and has permitted the development of many computational methods to translate image data into a single normalized "signal" for mRNA transcript abundance. Many probe set algorithms have now been developed, with a gradual movement away from chip-by-chip methods (MAS5) to project-based model-fitting methods (dCHIP, RMA, others). Data interpretation is often profoundly changed by choice of algorithm, with disoriented biologists questioning what the "accurate" interpretation of their experiment is. Here, we summarize the debate concerning probe set algorithms. We provide examples of how changes in mismatch weight, normalizations, and construction of expression ratios each dramatically change data interpretation. All interpretations can be considered as computationally appropriate, but with varying biological credibility. We also illustrate the performance of two new hybrid algorithms (PLIER, GC-RMA) relative to more traditional algorithms (dCHIP, MAS5, Probe Profiler PCA, RMA) using an interactive power analysis tool. PLIER appears superior to other algorithms in avoiding false positives with poorly performing probe sets. Based on our interpretation of the literature, and examples presented here, we suggest that the variability in performance of probe set algorithms is more dependent upon assumptions regarding "background" than on calculations of "signal". We argue that "background" is an enormously complex variable that can only be vaguely quantified, and thus the "best" probe set algorithm will vary from project to project.

George Washington University: Health Sciences Research Commons (HSRC)

Formation of regulatory modules by local sequence duplication

Author: A Stark
A Tanay
AL Halpern
AM Moses
AM Moses
AM Moses
Amos Tanay
Armita Nourmohammad
B Ondek
BP Berman
CM Bergman
CM Bergman
CT Harbison
D Gruen
D Stanojevic
DN Arnosti
DS Fields
E Segal
EE Hare
EH Davidson
EH Davidson
G Badis
G Benson
G Leung
GD Stormo
I Abnizova
J Berg
J Berg
J Monod
JM Hancock
K Thornton
L Li
M Kimura
M Kimura
M Levine
M Lynch
M Lynch
M Lässig
M Markstein
M Pachkov
M Ptashne
MC King
MD Vinces
Michael Lässig
MM Kulkarni
MS Halfon
MS Halfon
MV Katti
MZ Ludwig
MZ Ludwig
MZ Ludwig
MZ Ludwig
N Rajewsky
NE Buchler
O Berg
PW Messer
R Durbin
RJ Britten
RW Lusk
S Kullback
S Mukherjee
S Sinha
S Sinha
S Sinha
S Small
SJ Maerkl
SM Gallo
SW Doniger
V Boeva
V Mustonen
V Mustonen
V Mustonen
Z Wunderlich
Publication venue: 'Public Library of Science (PLoS)'
Publication date: 01/01/2011
Field of study

Turnover of regulatory sequence and function is an important part of molecular evolution. But what are the modes of sequence evolution leading to rapid formation and loss of regulatory sites? Here, we show that a large fraction of neighboring transcription factor binding sites in the fly genome have formed from a common sequence origin by local duplications. This mode of evolution is found to produce regulatory information: duplications can seed new sites in the neighborhood of existing sites. Duplicate seeds evolve subsequently by point mutations, often towards binding a different factor than their ancestral neighbor sites. These results are based on a statistical analysis of 346 cis-regulatory modules in the Drosophila melanogaster genome, and a comparison set of intergenic regulatory sequences in Saccharomyces cerevisiae. In fly regulatory modules, pairs of binding sites show significantly enhanced sequence similarity up to distances of about 50 bp. We analyze these data in terms of an evolutionary model with two distinct modes of site formation: (i) evolution from independent sequence origin and (ii) divergent evolution following duplication of a common ancestor sequence. Our results suggest that pervasive formation of binding sites by local sequence duplications distinguishes the complex regulatory architecture of higher eukaryotes from the simpler architecture of unicellular organisms.

arXiv.org e-Print Archive

Public Library of Science (PLOS)

Kölner UniversitätsPublikationsServer

Making Informed Choices about Microarray Data Analysis

Author: A Kauffmann
A Ploner
A Reiner
AC Culhane
AC Eklund
BE Stranger
BM Bolstad
BP Durbin
BP Durbin
C Li
CM Perou
DK Slonim
F Bretz
Fran Lewitter
G Kerr
GA Churchill
GK Smyth
GK Smyth
GP Page
HM Kang
I Lonnstedt
J Hou
J Leek
JC Marioni
JD Storey
JD Storey
JF Ayroles
JH Do
KV Mardia
LM Cope
M Dai
M Reimers
M Reimers
M Reimers
M Reimers
M Suarez-Farinas
Mark Reimers
MC Ryan
ME Figueroa
ME Ritchie
NR Garge
R Gentleman
RA Irizarry
RA Irizarry
RA Johnson
S Dudoit
T Hastie
T Hastie
TL Fare
W Huber
WE Johnson
WK Lim
WS Branham
X Cui
Y Benjamini
YH Yang
Publication venue: Public Library of Science
Publication date: 01/01/2010
Field of study

This article describes the typical stages in the analysis of microarray data for non-specialist researchers in systems biology and medicine. Particular attention is paid to significant data analysis issues that are commonly encountered among practitioners, some of which need wider airing. The issues addressed include experimental design, quality assessment, normalization, and summarization of multiple-probe data. This article is based on the ISMB 2008 tutorial on microarray data analysis. An expanded version of the material in this article and the slides from the tutorial can be found at http://www.people.vcu.edu/~mreimers/OGMDA/index.html

Public Library of Science (PLOS)

VCU Scholars Compass

SNPs Occur in Regions with Less Genomic Sequence Conservation

Author: A Grimson
A Siepel
BP Lewis
CF Baer
CM Wade
D Chasman
E Pennisi
ES Lander
FW Allendorf
GE Crooks
H Zhang
Ilya Ruvinsky
J Stapley
JC Venter
John C. Castle
JV Chamary
K Chen
L Cartegni
M Lynch
M Stratton
MA Saunders
MP Miller
PA Morin
RH Waterston
RM Durbin
RM Kuhn
ST Sherry
V Matys
WG Fairbrother
Publication venue: Public Library of Science
Publication date: 06/06/2011
Field of study

Rates of SNPs (single nucleotide polymorphisms) and cross-species genomic sequence conservation reflect intra- and inter-species variation, respectively. Here, I report SNP rates and genomic sequence conservation adjacent to mRNA processing regions and show that, as expected, more SNPs occur in less conserved regions and that functional regions have fewer SNPs. Results are confirmed using both mouse and human data. Regions include protein start codons, 3′ splice sites, 5′ splice sites, protein stop codons, predicted miRNA binding sites, and polyadenylation sites. Throughout, SNP rates are lower and conservation is higher at regulatory sites. Within coding regions, SNP rates are highest and conservation is lowest at codon position three and the fewest SNPs are found at codon position two, reflecting codon degeneracy for amino acid encoding. Exon splice sites show high conservation and very low SNP rates, reflecting both splicing signals and protein coding. Relaxed constraint on the third codon position is seen most clearly when exonic SNP rates are separated by intron phase. At polyadenylation sites, a peak of conservation and low SNP rate occurs from 30 to 17 nt preceding the site. This region is highly enriched for the sequence AAUAAA, reflecting the location of the conserved polyA signal. miRNA 3′ UTR target sites are predicted incorporating interspecies genomic sequence conservation; SNP rates are low in these sites, again showing fewer SNPs in conserved regions. Together, these results confirm that SNPs, reflecting recent genetic variation, occur more frequently in regions with less evolutionary conservation.

Comparison of Beta-value and M-value methods for quantifying methylation levels by microarray analysis

Author: BP Durbin
C O'Riain
C Thirlwell
CG Bell
Chiang-Ching Huang
CV Breton
D Grafodatskaya
DJ Weisenberger
EA Houseman
Illumina
Illumina
JG Herman
L Guo
L Shen
L Shi
L Shi
Lifang Hou
M Barnes
M Bibikova
M Bibikova
M Esteller
Nadereh Jafari
P Du
Pan Du
PW Laird
RA Irizarry
S Davis
Simon M Lin
SM Lin
Warren A Kibbe
Xiao Zhang
Publication venue: BioMed Central
Publication date: 01/01/2010
Field of study

"Hook"-calibration of GeneChip-microarrays: Theory and algorithm

Author: A Halperin
Affymetrix
BP Durbin
C Li
CJ Burden
CJ Burden
D Hekstra
DC Hoyle
E Carlon
F Naef
GA Held
GA Held
H Binder
Hans Binder
L Zhang
M Havilio
N Sugimoto
RA Irizarry
Stephan Preibisch
T Heim
T Lu
W Huber
WH Press
Z Wu
Z Wu
Z Wu
Publication venue: BioMed Central
Publication date: 01/01/2008
Field of study

Abstract Background: The improvement of microarray calibration methods is an essential prerequisite for quantitative expression analysis. This issue requires the formulation of an appropriate model describing the basic relationship between the probe intensity and the specific transcript concentration in a complex environment of competing interactions, the estimation of the magnitude of these effects and their correction using the intensity information of a given chip and, finally, the development of practicable algorithms which judge the quality of a particular hybridization and estimate the expression degree from the intensity values. Results: We present the so-called hook-calibration method, which co-processes the log-difference (delta) and log-sum (sigma) of the perfect match (PM) and mismatch (MM) probe intensities. The MM probes are utilized as an internal reference which is subjected to the same hybridization law as the PM, however with modified characteristics. After sequence-specific affinity correction, the method fits the Langmuir adsorption model to the smoothed delta-versus-sigma plot. The geometrical dimensions of this so-called hook curve characterize the particular hybridization in terms of simple geometric parameters which provide information about the mean non-specific background intensity, the saturation value, the mean PM/MM-sensitivity gain and the fraction of absent probes. This graphical summary spans a metrics system for expression estimates in natural units such as the mean binding constants and the occupancy of the probe spots. The method is single-chip based, i.e. it separately uses the intensities for each selected chip. Conclusion: The hook method corrects the raw intensities for the non-specific background hybridization in a sequence-specific manner, for the potential saturation of the probe spots with bound transcripts and for the sequence-specific binding of specific transcripts. The obtained chip characteristics in combination with the sensitivity-corrected probe-intensity values provide expression estimates scaled in natural units which are given by the binding constants of the particular hybridization.
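The delta and sigma coordinates mentioned in the abstract above can be sketched for a handful of PM/MM probe pairs. This is only an illustration of the first step: the base-10 logarithm, the factor 0.5 in sigma (an average rather than a plain sum), the moving-average smoother and the function names are all assumptions made here, and the sequence-specific affinity correction and the Langmuir fit are not attempted.

```python
import math


def hook_coordinates(pm, mm):
    """Return (sigma, delta) pairs for paired PM/MM intensities, sorted by sigma.

    Assumed definitions (not taken from the abstract):
      delta = log10(PM) - log10(MM)
      sigma = 0.5 * (log10(PM) + log10(MM))
    """
    points = []
    for p, m in zip(pm, mm):
        log_p, log_m = math.log10(p), math.log10(m)
        points.append((0.5 * (log_p + log_m), log_p - log_m))
    return sorted(points)


def smooth_delta(points, window=3):
    """Moving average of delta over probes ordered by sigma."""
    half = window // 2
    smoothed = []
    for i in range(len(points)):
        chunk = points[max(0, i - half): i + half + 1]
        avg = sum(delta for _, delta in chunk) / len(chunk)
        smoothed.append((points[i][0], avg))
    return smoothed
```

Plotting the smoothed delta against sigma for one chip would give the kind of curve the abstract refers to as the hook curve; a model fit would then be applied to that curve.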